Box-and-Whisker Plots: Quartiles, Boxes, and Whiskers
Statistics often assumes that your data points (the numbers in your list) are clustered around some central value. The "box" in the box-and-whisker plot contains, and thereby highlights, the middle half of these data points.

To create a box-and-whisker plot, start by ordering your data (putting the values in numerical order) if they aren't ordered already. Then find the median of your data. The median divides the data into two halves. To divide the data into quarters, you then find the medians of these two halves.

Note: If you have an even number of values, so that the first median was the average of the two middle values, then you include those middle values in your sub-median computations (each one belongs to its own half). If you have an odd number of values, so that the first median was an actual data point, then you do not include that value in your sub-median computations. That is, to find the sub-medians, you're only looking at the values that haven't yet been used.

You now have three points: the first middle point (the median) and the middle points of the two halves (what I call the "sub-medians"). These three points divide the entire data set into quarters, called "quartiles". The top point of each quarter is named with a "Q" followed by the number of that quarter. So the top point of the first quarter of the data points is "Q1", and so forth. Note that Q1 is also the middle number for the first half of the list, Q2 is also the middle number for the whole list, Q3 is the middle number for the second half of the list, and Q4 is the largest value in the list.

Once you have these three points, Q1, Q2, and Q3, along with the smallest and largest values in the list, you have all you need to draw a simple box-and-whisker plot.

Here's an example of how it works. Draw a box-and-whisker plot for the following data set:

4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1, 4.6, 4.4, 4.3, 4.8, 4.4, 4.2, 4.5, 4.4

My first step is to order the set.
This gives me:

3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4, 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1

The first number I need is the median of the entire set. Since there are seventeen values in this list, I need the ninth value (shown in brackets):

3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, [4.4], 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1

The median is Q2 = 4.4.

The next two numbers I need are the medians of the two halves. Since I used the "4.4" in the middle of the list, I can't reuse it, so my two remaining data sets are:

3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4

and

4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1

The first half has eight values, so its median is the average of the middle two (the fourth and fifth values):

Q1 = (4.3 + 4.3)/2 = 4.3

The second half also has eight values, so its median is:

Q3 = (4.7 + 4.8)/2 = 4.75

Since my list values have one decimal place and range from 3.9 to 5.1, I won't use a scale of, say, zero to ten, marked off by ones. Instead, I'll draw a number line from 3.5 to 5.5, marked off by tenths.

[Figure: number line from 3.5 to 5.5, marked off by tenths]

Now I'll mark off the minimum and maximum values, and Q1, Q2, and Q3:

[Figure: number line with min = 3.9, Q1 = 4.3, Q2 = 4.4, Q3 = 4.75, and max = 5.1 marked]

The "box" part of the plot goes from Q1 to Q3:

[Figure: box drawn from Q1 = 4.3 to Q3 = 4.75, with the median 4.4 marked inside]

And then the "whiskers" are drawn out to the endpoints, the minimum (3.9) and the maximum (5.1):

[Figure: whiskers drawn from the box out to 3.9 and 5.1]

By the way, box-and-whisker plots don't have to be drawn horizontally as I did above; they can be vertical, too.

Copyright © Elizabeth Stapel 2004-2011 All Rights Reserved
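If you'd like to check hand-computed quartiles by computer, here is a minimal Python sketch of the halving method described above. It assumes a plain list of at least two numbers; `five_number_summary` is a hypothetical helper name, not part of any library, and the only library call it relies on is the standard `statistics.median`. Be aware that many software packages compute quartiles by interpolation instead, so their Q1 and Q3 can differ slightly from values found this way.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the halving method described above.

    Hypothetical helper: with an odd number of values, the median itself is
    left out of both halves; with an even number, the list splits evenly and
    each of the two middle values lands in its own half.
    """
    data = sorted(values)  # step 1: put the values in numerical order
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values to form two halves")
    half = n // 2
    lower = data[:half]  # first half (never contains an odd-count median)
    # Odd count: skip the middle value; even count: start right at the midpoint.
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


# The seventeen values from the worked example.
example = [4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1,
           4.6, 4.4, 4.3, 4.8, 4.4, 4.2, 4.5, 4.4]
print(five_number_summary(example))  # (3.9, 4.3, 4.4, 4.75, 5.1)

# An even-sized list: the middle values 3 and 4 go into separate halves.
print(five_number_summary([1, 2, 3, 4, 5, 6]))  # (1, 2, 3.5, 5, 6)
```

The sketch reuses `statistics.median` for every median, so the only method-specific logic is how the ordered list is split into halves; those five numbers are exactly what gets marked on the number line before drawing the box and whiskers.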